Search CORE

79 research outputs found

Predicting residue-wise contact orders in proteins by support vector regression

Author: A Bairoch
AG Murzin
AR Kinjo
B Rost
CH Tsai
D Kihara
D Sarda
DT Jones
G Pollastri
GP Raghava
HM Berman
J Song
J Wang
Jiangning Song
JM Chandonia
Kevin Burrage
KW Plaxco
M Punta
MPS Brown
NP Prabhu
S Ahmad
S Hua
V Vapnik
W Kabsch
W Liu
X Wang
Z Yuan
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: The residue-wise contact order (RWCO) describes the sequence separations between a residue of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable information for reconstructing the protein three-dimensional structure from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and could give deep insights into protein sequence-structure relationships.

RESULTS: We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on prediction performance: local sequence in the form of PSI-BLAST profiles; local sequence plus amino acid composition; local sequence plus molecular weight; local sequence plus secondary structure predicted by PSIPRED; local sequence plus molecular weight and amino acid composition; local sequence plus molecular weight and predicted secondary structure; and local sequence plus molecular weight, amino acid composition and predicted secondary structure. When using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) of 0.55 between the predicted and observed RWCO values and a root mean square error (RMSE) of 0.82, based on a well-defined dataset of 680 protein sequences. Moreover, by incorporating global features such as molecular weight and amino acid composition, we could further improve the prediction performance, raising the CC to 0.57 and lowering the RMSE to 0.79. In addition, incorporating secondary structure predicted by PSIPRED was found to significantly improve the prediction performance and yielded the best prediction accuracy, with a CC of 0.60 and an RMSE of 0.78, which is at least comparable to other existing methods.

CONCLUSION: The SVR method shows a prediction performance competitive with, or at least comparable to, the previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is well suited to estimating the raw real-valued profiles of the samples. The successful application of the SVR approach in this study reinforces the view that support vector regression is a powerful tool for extracting the protein sequence-structure relationship and for estimating protein structural profiles from amino acid sequences.
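To make the predicted quantity concrete, the sketch below computes RWCO from 3D coordinates using the form in which RWCO is commonly defined: RWCO_i = (1/L) * sum over j with |i - j| > 2 of |i - j| * C_ij, where L is the chain length and C_ij = 1 when residues i and j are in contact. The use of one C-beta atom per residue, the hard 12 Å cutoff, and the helper name `residue_wise_contact_order` are assumptions for illustration only. This computes observed RWCO values from a structure; it is not the sequence-based SVR predictor described above.

```python
import numpy as np

def residue_wise_contact_order(coords, cutoff=12.0, min_sep=3):
    """Hypothetical helper: RWCO for each residue of one chain.

    coords:  (L, 3) array, one representative atom per residue (assumed C-beta)
    cutoff:  contact distance in angstroms (12.0 is an assumption)
    min_sep: smallest sequence separation counted, i.e. |i - j| > 2
    """
    coords = np.asarray(coords, dtype=float)
    L = len(coords)
    # Pairwise Euclidean distances between residues
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Sequence separation |i - j| for every residue pair
    idx = np.arange(L)
    sep = np.abs(idx[:, None] - idx[None, :])
    # Hard contact indicator C_ij (a smooth sigmoid contact is a common alternative)
    contact = (dist < cutoff) & (sep >= min_sep)
    # RWCO_i = (1/L) * sum_j |i - j| * C_ij
    return (sep * contact).sum(axis=1) / L
```

For five residues placed at the same point, every pair with |i - j| > 2 counts as a contact, so the function returns [1.4, 0.6, 0.0, 0.6, 1.4]. The ends of the chain get the largest values because only they have partners far away in sequence.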

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Queensland University of Technology ePrints Archive

University of Queensland eSpace

Recommended from our members

Inverse analysis of critical current density in a bulk high-temperature superconducting undulator

Author: Ainslie MD
Calvi M
Dennis AR
Durrell JH
Hellmann S
Kinjo R
Liang X
Schmidt T
Zhang K
Publication venue: Physical Review Accelerators and Beams
Publication date: 01/03/2022

To optimise the design of undulators using high-temperature superconductor (HTS) bulks, we have developed a method to estimate the critical current density (Jc) of each bulk from the overall measured magnetic field of an undulator. The vertical magnetic field was measured along the electron-beam axis in an HTS bulk-based undulator consisting of twenty Gd-Ba-Cu-O (GdBCO) bulks inserted in a 12-T solenoid. The Jc values of the bulks were estimated by an inverse analysis approach in which the magnetic field was calculated by forward simulation of the shielding currents in each HTS bulk with a given Jc. The Jc values were then iteratively updated using a pre-calculated response matrix of the undulator magnetic field to Jc. We demonstrate that the Jc of each HTS bulk can be determined with sufficient accuracy for practical application within around 10 iterations. Because the response matrix is created in advance, the inverse analysis can be performed in a practically short time, on the order of several hours. We also investigated the effect of measurement error, which destroys the uniqueness of the solution, and clarified points to note for future magnetic field measurements. The results show that this inverse-analysis method allows the Jc of each bulk in an HTS bulk undulator to be estimated.
CHART (Swiss Accelerator Research and Technology Collaboration); EPSRC Early Career Fellowship, EP/P020313/
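The abstract describes an iterative update driven by a pre-calculated response matrix but does not give the exact rule. The sketch below assumes one plausible form: a linearised least-squares (Gauss-Newton-style) correction that applies the pseudo-inverse of the response matrix to the field mismatch at each step. The `forward` callable stands in for the shielding-current simulation, and every name here is hypothetical.

```python
import numpy as np

def estimate_jc(forward, response, b_measured, jc0, n_iter=10):
    """Iteratively fit per-bulk Jc values to a measured field profile (sketch).

    forward:    hypothetical callable, Jc vector (n_bulks,) -> field samples (n_points,)
    response:   precomputed (n_points, n_bulks) sensitivity of the field to each Jc
    b_measured: measured field samples along the beam axis (n_points,)
    jc0:        initial guess for the Jc values (n_bulks,)
    """
    jc = np.array(jc0, dtype=float)
    b_measured = np.asarray(b_measured, dtype=float)
    # Pseudo-inverse of the response matrix: computed once, reused every iteration
    update = np.linalg.pinv(response)
    for _ in range(n_iter):
        residual = b_measured - forward(jc)  # mismatch between measured and simulated field
        jc = jc + update @ residual          # linearised least-squares correction
    return jc
```

Precomputing the response matrix is what keeps each iteration cheap: only one forward simulation per step is needed, not one per bulk. If the forward model were exactly linear with the same response matrix, a single iteration would recover Jc; nonlinearity in the real shielding-current model is what makes repeated iterations necessary.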

Apollo (Cambridge)

Nature of protein family signatures: Insights from singular value analysis of position-specific scoring matrices

Author: A Bundi
A Kidera
AG Murzin
Akira R. Kinjo
AR Kinjo
B Qian
B Rost
BE Suzek
C Barber
C Rosano
D Bashford
David Jones
DT Jones
F Beghin
FM Richards
G Wang
Haruki Nakamura
HM Berman
J Kyte
JL Fauchère
JO Wrabl
JT Lecomte
JU Bowie
K Nakai
K Nishikawa
K Tomii
M Charton
M Gribskov
M Kann
M Levitt
M Oobatake
M Ota
M Porto
MG Rudolph
MO Dayhoff
P Klein
P Koehl
P Pokarowski
PHA Sneath
R Aurora
R Durbin
R Grantham
RA Horn
RD Finn
RF Doolittle
RM Sweet
S Fukuchi
S Henikoff
S Kawashima
S Miyazawa
SF Altschul
SR Eddy
T Ishida
TM Cover
U Bastolla
WE Royer Jr
WR Taylor
Z Yuan
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 07/11/2007

Position-specific scoring matrices (PSSMs) are useful for detecting weak homology in protein sequence analysis, and they are thought to contain some essential signatures of protein families. To elucidate what kinds of ingredients constitute such family-specific signatures, we apply singular value decomposition to a set of PSSMs and examine the properties of the dominant right and left singular vectors. The first right singular vectors were correlated with various amino acid indices, including relative mutability, amino acid composition in the protein interior, hydropathy, or turn propensity, depending on the protein. A significant correlation between the first left singular vector and a measure of site conservation was observed. It is shown that the contribution of the first singular component to the PSSMs acts to disfavor residues at conserved sites that could otherwise be falsely identified as functionally important. The second right singular vectors were highly correlated with hydrophobicity scales, and the corresponding left singular vectors with contact numbers of protein structures. It is suggested that sequence alignment with a PSSM is essentially equivalent to threading supplemented with functional information. The presented method may be used to separate functionally important sites from structurally important ones, and thus it may be a useful tool for predicting protein functions.
Comment: 22 pages, 7 figures, 4 tables
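The core operation here is a singular value decomposition of each PSSM, viewed as an L x 20 matrix (sites by amino acids): left singular vectors are per-site profiles and right singular vectors are per-amino-acid profiles. The minimal NumPy sketch below assumes the PSSM is decomposed as-is, without centring or normalisation (the abstract does not say), and compares a singular vector against an amino acid index. `dominant_components` and `abs_correlation` are hypothetical names.

```python
import numpy as np

def dominant_components(pssm, k=2):
    """Top-k singular values and singular vectors of an (L sites x 20 amino acids) PSSM."""
    pssm = np.asarray(pssm, dtype=float)
    u, s, vt = np.linalg.svd(pssm, full_matrices=False)
    # u[:, i] is the i-th left singular vector (one value per site);
    # vt[i] is the i-th right singular vector (one weight per amino acid)
    return s[:k], u[:, :k], vt[:k]

def abs_correlation(x, y):
    """|Pearson r|; singular vectors are defined only up to sign."""
    return abs(np.corrcoef(x, y)[0, 1])
```

In use, one would correlate `vt[0]` (or `vt[1]`) with a 20-component amino acid scale such as a hydropathy index, and `u[:, 0]` with a per-site conservation score, mirroring the comparisons reported in the abstract. Taking the absolute value avoids spurious sign flips between otherwise identical decompositions.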

arXiv.org e-Print Archive

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Composite structural motifs of binding sites for delineating biological functions of proteins

Author: A Bairoch
A Fiorillo
A Rausell
A Stark
AC Joerger
AC Wallace
AG Murzin
Akira R. Kinjo
AM Schnoes
AR Kinjo
B Bollobás
B Dasgupta
B Louie
B Rost
BH Dessailly
C Branden
C Winter
CV Robinson
D Petrey
DJ Schuller
DM Chipman
E Krissinel
E Toyota
FP Davis
GM Santos
H Berman
H Kettenberger
Haruki Nakamura
I Friedberg
J Janin
J Shi
J Westbrook
JI Yeh
K Chen
K Henrick
K Kinoshita
K Okazaki
K Stenberg
L Xie
M Bashton
M Brylinski
M Kitayner
M Levitt
M Moertl
M Nardini
M Tyagi
M Yang
N Nagano
N Tuncbag
N Zhao
ND Gold
O Keskin
OC Redfern
Ozlem Keskin
P Cramer
P Shannon
PD Pawelek
R Koike
R Rentzsch
R Sinha
RR Thangudu
S Kadono
SF Altschul
T Amemiya
T Kawabata
TA Holland
TC Terwilliger
Y Loewenstein
Z Aung
ZX Xia
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2011

Most biological processes are described as a series of interactions between proteins and other molecules, and interactions are in turn described in terms of atomic structures. To annotate protein functions as sets of interaction states at atomic resolution, and thereby to better understand the relation between protein interactions and biological functions, we conducted exhaustive all-against-all atomic structure comparisons of all known binding sites for ligands, including small molecules, proteins and nucleic acids, and identified recurring elementary motifs. By integrating the elementary motifs associated with each subunit, we defined composite motifs, which represent context-dependent combinations of elementary motifs. It is demonstrated that function similarity can be better inferred from composite motif similarity than from the similarity of protein sequences or of individual binding sites. By integrating the composite motifs associated with each protein function, we define meta-composite motifs, each of which is regarded as a time-independent diagrammatic representation of a biological process. It is shown that meta-composite motifs provide richer annotations of biological processes than sequence clusters. The present results serve as a basis for bridging atomic structures to higher-order biological phenomena by classification and integration of binding site structures.
Comment: 34 pages, 7 figures

arXiv.org e-Print Archive

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Publication of nuclear magnetic resonance experimental data with semantic web technology and the application thereof to biomedical research of proteins

Author: AP Joseph
AR Kinjo
EL Ulrich
F Belleau
L Wang
N Juty
N Spadaccini
RI Wakefield
SR Hall
W Lee
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Selective Constraints on Amino Acids Estimated by a Mechanistic Codon Substitution Model with Multiple Nucleotide Changes

Author: A Doron-Faigenboim
A Schneider
AL Halpern
AR Kinjo
C Kosiol
Darren Martin
DT Jones
G Bazykin
GC Conant
H Akaike
I Keller
J Adachi
JP Huelsenbeck
K Tamura
L Jin
M Anisimova
M Averof
M Hasegawa
M Kimura
MA Larkin
MO Dayhoff
MW Dimmic
N Goldman
N Rodrigue
N Takahata
NGC Smith
R Grantham
S Guindon
S Miyazawa
S Whelan
Sanzo Miyazawa
SC Choi
SQ Le
SV Muse
T Miyata
TK Seo
W Delport
Z Yang
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 18/03/2011

Empirical substitution matrices represent the average tendencies of substitutions over various protein families at the cost of gene-level resolution. We develop a codon-based model in which the mutational tendencies of codons, the genetic code, and the strength of selective constraints against amino acid replacements can be tailored to a given gene. First, selective constraints averaged over proteins are estimated by maximizing the likelihood of each 1-PAM matrix of empirical amino acid (JTT, WAG, and LG) and codon (KHG) substitution matrices. Then, selective constraints specific to given proteins are approximated as a linear function of those estimated from the empirical substitution matrices. Akaike information criterion (AIC) values indicate that a model allowing multiple nucleotide changes fits the empirical substitution matrices significantly better. Also, the maximum-likelihood (ML) estimates of transition-transversion bias obtained from these empirical matrices are not as large as previously estimated. The selective constraints are characteristic of proteins rather than species. However, their relative strengths among amino acid pairs can be approximated as depending mainly on the amino acid pair rather than on the protein family, because the present model, in which selective constraints are approximated as a linear function of those estimated from the JTT/WAG/LG/KHG matrices, can provide a good fit to other empirical substitution matrices, including cpREV for chloroplast proteins and mtREV for vertebrate mitochondrial proteins.

The present codon-based model, with the ML estimates of selective constraints and adjustable nucleotide mutation rates, would be useful as a simple substitution model in ML and Bayesian inference of molecular phylogenetic trees, and enables biologically meaningful information to be obtained at both the nucleotide and amino acid levels from codon and protein sequences.
Comment: Table 9 in this article includes corrections for errata in Table 9 as published in 10.1371/journal.pone.0017244. Supporting information is attached at the end of the article, and a computer-readable dataset of the ML estimates of selective constraints is available from 10.1371/journal.pone.001724
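The model comparison above rests on the Akaike information criterion, AIC = 2k - 2 ln L, where k is the number of free parameters and L is the maximised likelihood; the model with the lower AIC is preferred. The sketch below shows only that bookkeeping. The helper names and the idea of passing (log-likelihood, parameter count) pairs are assumptions for illustration, not the authors' code.

```python
from typing import Dict, Tuple

def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion, AIC = 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def best_by_aic(models: Dict[str, Tuple[float, int]]) -> str:
    """Name of the model with the smallest AIC.

    models maps a model name to (maximised log-likelihood, number of free parameters).
    """
    return min(models, key=lambda name: aic(*models[name]))
```

The 2k term penalises extra parameters, so a richer model, such as one that also allows multiple nucleotide changes per codon substitution, is favoured only when its gain in log-likelihood exceeds the number of parameters it adds.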

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

A Didactic Model of Macromolecular Crowding Effects on Protein Folding

Author: A Emperador
A Kudlay
A Samiotakis
AC Ferreon
Allen P. Minton
Andreas Hofmann
AP Minton
AR Kinjo
BJ Alder
CH Davis
D Homouz
D Ridgway
D Tsao
DL Pincus
Douglas Tsao
DP Goldenberg
E Rivera
F Ding
GI Makhatadze
H Reiss
H-X Zhou
HC Andersen
HX Zhou
J Batra
J Han
J Mittal
JK Cheung
JL Lebowitz
L Stagg
MA Cotter
MS Cheung
N Tokuriki
Nikolay V. Dokholyan
NV Dokholyan
S Asakura
S Qin
S Sharma
SB Zimmerman
SR McGuffee
T Boublík
VK Shen
Y Zhou
Publication venue: Public Library of Science
Publication date: 01/01/2010

A didactic model is presented to illustrate how the effect of macromolecular crowding on protein folding and association is modeled using current analytical theory and discrete molecular dynamics. While analytical treatments of crowding may consider the effect as a potential of average force acting to compress a polypeptide chain into a compact state, simulations allow crowding reagents to be treated explicitly. Using an analytically solvable toy model for protein folding, an approximate statistical thermodynamic method is directly compared to simulation in order to gauge the effectiveness of current analytical crowding descriptions. The two methodologies are in quantitative agreement under most conditions, indicating that both current theory and simulation methods can recapitulate aspects of protein folding even with a simplistic protein model.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Carolina Digital Repository

svmPRAT: SVM-based Protein Residue Annotation Toolkit

Author: A Kernytsky
AG de Brevern
AG Murzin
AK Dunker
AR Kinjo
B Rost
C Etchebest
C Kauffman
Christopher Kauffman
DT Jones
G Karypis
G Pollastri
GE Crooks
George Karypis
H Rangwala
Huzefa Rangwala
J Cheng
M Gribskov
O Noivirit-Brik
R Ahmed
R Karchin
R Sanchez
RC Whaley
S Ahmad
S Hirose
SF Altschul
T Joachims
T Schwede
V Vapnik
VN Vapnik
W Kabsch
Y Ofran
Z Dosztányi
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Over the last decade, several prediction methods have been developed for determining the structural and functional properties of individual protein residues using sequence and sequence-derived information. Most of these methods are based on support vector machines, as they provide accurate and generalizable prediction models.

Results: We present a general-purpose protein residue annotation toolkit (svmPRAT) that allows biologists to formulate residue-wise prediction problems. svmPRAT formulates the annotation problem as a classification or regression problem using support vector machines. One of the key features of svmPRAT is its ease of use in incorporating any user-provided information in the form of feature matrices. For every residue, svmPRAT captures local information around the residue to create fixed-length feature vectors. svmPRAT implements accurate and fast kernel functions, and also introduces a flexible window-based encoding scheme that accurately captures signals and patterns for training effective predictive models.

Conclusions: In this work we evaluate svmPRAT on several classification and regression problems, including disorder prediction, residue-wise contact order estimation, DNA-binding site prediction, and local structure alphabet prediction. svmPRAT has also been used to develop a state-of-the-art transmembrane helix prediction method called TOPTMH and a secondary structure prediction method called YASSPP. The toolkit provides practitioners with an efficient and easy-to-use tool for a wide variety of annotation problems. Availability: http://www.cs.gmu.edu/~mlbio/svmprat
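The basic idea behind turning a per-residue feature matrix into fixed-length vectors can be sketched as a plain sliding window: for residue i, concatenate the feature rows of residues i - w through i + w, zero-padding past either end of the chain. svmPRAT's own encoding is described as flexible and is not specified in the abstract, so this unweighted window and the name `window_features` are assumptions, not the toolkit's implementation.

```python
import numpy as np

def window_features(features, w):
    """One fixed-length vector per residue from an (L, d) feature matrix.

    Concatenates the rows of residues i - w .. i + w (2w + 1 rows, so
    (2w + 1) * d values); positions beyond either chain end are zero-padded.
    """
    features = np.asarray(features, dtype=float)
    L, d = features.shape
    padded = np.vstack([np.zeros((w, d)), features, np.zeros((w, d))])
    # Residue i sits at padded row i + w, so rows i .. i + 2w form its window
    return np.stack([padded[i:i + 2 * w + 1].ravel() for i in range(L)])
```

With a PSI-BLAST profile as input (d = 20) and w = 7, for example, each residue becomes a 300-dimensional vector that a classifier or regressor can consume. Zero-padding keeps the vector length constant at the termini, at the cost of mixing "no residue" with genuinely zero-valued features.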

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Prodepth: Predict Residue Depth by Support Vector Regression Approach from Protein Sequences Only

Author: A Pintar
A Schlessinger
A Shrake
AG Murzin
AR Kinjo
AR Kinjo
Ashley M. Buckle
B Lee
B Rost
C Chothia
CK Smith
D Baker
D Varrazzo
D Xie
DT Jones
E Schmitt
EM Marcotte
F Ferre
G Pollastri
Geoffrey I. Webb
GP Raghava
H Chen
H Zhang
H Zhou
Hao Tan
HM Berman
J Cheng
J Qiu
J Song
J Wan
James C. Whisstock
JC Whisstock
Jiangning Song
JJ Ward
JM Chandonia
JU Bowie
K Bajaj
K Chen
K Vlahovicek
Khalid Mahmood
L Kurgan
LA Kurgan
M Connolly
M Kumar
M Lee
M Stout
ME Lacombe-Harvey
MK Kalita
MN Nguyen
O Schueler-Furman
P Radivojac
RG Coleman
Ruby H. P. Law
S Ahmad
S Chakravarty
S Liu
S Miller
Sean David Mooney
SF Altschul
T Hamelryck
T Ishida
T Joachims
T Noguchi
Tatsuya Akutsu
TL Blundell
V Vapnik
W Kabsch
W Liu
W Zhang
WL DeLano
X Wang
Y Bromberg
Y Kalidas
Y Ofran
Z Yuan
ZX Wang
Publication venue: Public Library of Science
Publication date: 01/01/2009

Residue depth (RD) is a solvent exposure measure that complements the information provided by conventional accessible surface area (ASA) and describes the extent to which a residue is buried within the protein structure. Previous studies have established that RD is correlated with several protein properties, such as protein stability, residue conservation and amino acid type. Accurate prediction of RD has many potentially important applications in structural bioinformatics, for example, facilitating the identification from sequence information of functionally important residues, residues in the folding nucleus, or enzyme active sites. In this work, we introduce an efficient approach that uses support vector regression to quantify the relationship between RD and protein sequence. We systematically investigated eight different sequence encoding schemes, including both local and global sequence characteristics, and examined their respective prediction performances. For an objective evaluation of our approach, we used 5-fold cross-validation to assess prediction accuracy and showed that the best overall performance, a correlation coefficient (CC) of 0.71 between the observed and predicted RD values and a root mean square error (RMSE) of 1.74, was achieved after incorporating the relevant multiple sequence features. The results suggest that residue depth can be reliably predicted solely from protein primary sequences: local sequence environments are the major determinants, while global sequence features influence prediction performance only marginally. We highlight two examples for comparison to illustrate the applicability of this approach. We also discuss the potential implications of this new structural parameter for protein structure prediction and homology modeling. This method might prove to be a powerful tool for sequence analysis.
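The evaluation protocol (5-fold cross-validation scored by CC and RMSE) can be sketched independently of the regressor. To stay dependency-free, the sketch below swaps in ridge regression as a stand-in for support vector regression and scores pooled out-of-fold predictions; whether the paper pooled predictions or averaged per-fold scores is not stated, and all function names are hypothetical.

```python
import numpy as np

def pearson_cc(y_true, y_pred):
    """Pearson correlation coefficient between observed and predicted values."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

def rmse(y_true, y_pred):
    """Root mean square error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def cross_validate(X, y, fit, k=5, seed=0):
    """k-fold cross-validation returning (CC, RMSE) on pooled out-of-fold predictions.

    fit(X_train, y_train) must return a callable that predicts y for new rows of X.
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)  # random, disjoint folds
    pred = np.empty(len(y))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        predict = fit(X[train_idx], y[train_idx])
        pred[test_idx] = predict(X[test_idx])  # each sample is predicted exactly once
    return pearson_cc(y, pred), rmse(y, pred)

def fit_ridge(X, y, lam=1e-3):
    """Stand-in regressor: ridge regression with an intercept (NOT support vector regression)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.linalg.solve(Xb.T @ Xb + lam * np.eye(Xb.shape[1]), Xb.T @ y)
    return lambda X_new: np.hstack([X_new, np.ones((len(X_new), 1))]) @ w
```

Any regressor that follows the `fit` contract, including an SVR wrapper, can be dropped in. For residue-level data, folds are usually built from whole proteins rather than individual residues, so that neighbouring residues of the same chain never appear in both training and test sets; the random split above ignores that grouping.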

CiteSeerX

Public Library of Science (PLOS)

Crossref

PubMed Central

University of Melbourne Institutional Repository

Interleukin-12p40 Modulates Human Metapneumovirus-Induced Pulmonary Disease in an Acute Mouse Model of Infection

The mechanisms that regulate the host immune response induced by human metapneumovirus (hMPV), a newly recognized member of the Paramyxoviridae family, are largely unknown. Cytokines play an important role in modulating inflammatory responses during viral infections. IL-12p40, known to be an important mediator in limiting lung inflammation, is induced by hMPV, and its production is sustained after the resolution phase of infection, suggesting that this cytokine plays a role in the immune response against hMPV. In this work, we demonstrated that in mice deficient in IL-12p40, hMPV infection induced an exacerbated pulmonary inflammatory response and mucus production, an altered cytokine response, and decreased lung function. However, viral replication was not affected in these mice. These results identify an important regulatory role of IL-12p40 in hMPV infection.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Louisiana State University